The Misuse of the NASA Metrics Data Program Data Sets for Automated Software Defect Prediction

Authors

David Gray

David Bowes

Neil Davey

Yi Sun

Bruce Christianson

Abstract

Background: The NASA Metrics Data Program data sets have been heavily used in software defect prediction experiments. Aim: To demonstrate and explain why these data sets require significant pre-processing in order to be suitable for defect prediction. Method: A meticulously documented data cleansing process involving all 13 of the original NASA data sets. Results: After our novel data cleansing process, each of the data sets had between 6 and 90 percent fewer recorded values than it originally contained. Conclusions: One: Researchers need to analyse the data that forms the basis of their findings in the context of how it will be used. Two: Defect prediction data sets could benefit from lower level code metrics in addition to those more commonly used, as these will help to distinguish modules, reducing the likelihood of repeated data points. Three: The bulk of defect prediction experiments based on the NASA Metrics Data Program data sets may have led to erroneous findings. This is mainly because repeated data points can cause substantial amounts of training and testing data to be identical.
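
The concern about repeated data points can be illustrated with a small sketch. It is not the authors' cleansing procedure, which the abstract does not detail; it only shows, on hypothetical toy rows of the form (lines of code, cyclomatic complexity, defect label), how identical rows can end up on both sides of a random train/test split and how dropping repeats removes that overlap. The function names, the toy data, and the 30 percent test fraction are all assumptions made for illustration.

```python
import random


def count_leaked_test_rows(rows, test_fraction=0.3, seed=0):
    """Randomly split rows into train/test and count the test rows
    whose exact values also appear in the training portion."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    test, train = shuffled[:n_test], shuffled[n_test:]
    train_set = set(train)
    leaked = sum(1 for row in test if row in train_set)
    return leaked, len(test)


def drop_repeated_rows(rows):
    """Keep the first occurrence of each distinct row, preserving order."""
    return list(dict.fromkeys(rows))


# Hypothetical toy data: (lines_of_code, cyclomatic_complexity, defective)
rows = [(10, 2, 0)] * 6 + [(25, 5, 1)] * 3 + [(40, 9, 1), (7, 1, 0)]

leaked_before, n_test_before = count_leaked_test_rows(rows)
unique_rows = drop_repeated_rows(rows)
leaked_after, n_test_after = count_leaked_test_rows(unique_rows)

print(f"before: {leaked_before} of {n_test_before} test rows also in training")
print(f"after:  {leaked_after} of {n_test_after} test rows also in training")
```

Once every row is distinct, no test row can match a training row, so the second count is always zero. A classifier evaluated on the first split, by contrast, may be scored partly on rows it has already seen during training.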

Similar resources

New findings on the use of static code attributes for defect prediction Muhammed

Defect prediction includes tasks that are based on methods generated using software fault data sets and requires much effort to be completed. In defect prediction, although there are methods to conduct an analysis involving the classification of data sets and localisation of defects, those methods are not sufficient without eliminating repeated data points. The NASA Metrics Data Program (Nasa ...

Software Defect Prediction Based on Competitive Organization CoEvolutionary Algorithm

In order to improve the accuracy of prediction for software defect data sets, competitive organization coevolutionary algorithm is presented and applied for dealing with the software defect data. During this algorithm, mechanism of competition is introduced into coevolutionary algorithm. Then leagues are formed based on the importance of attributes among them. And three evolution operators whic...

Using the Support Vector Machine as a Classification Method for Software Defect Prediction with Static Code Metrics

The automated detection of defective modules within software systems could lead to reduced development costs and more reliable software. In this work the static code metrics for a collection of modules contained within eleven NASA data sets are used with a Support Vector Machine classifier. A rigorous sequence of pre-processing steps were applied to the data prior to classification, including t...

Evaluation of Classifiers in Software Fault-Proneness Prediction

Reliability of software counts on its fault-prone modules. This means that the less software consists of fault-prone units the more we may trust it. Therefore, if we are able to predict the number of fault-prone modules of software, it will be possible to judge the software reliability. In predicting software fault-prone modules, one of the contributing features is software metric by which one ...

Software defect prediction using static code metrics : formulating a methodology

Software defect prediction is motivated by the huge costs incurred as a result of software failures. In an effort to reduce these costs, researchers have been utilising software metrics to try and build predictive models capable of locating the most defect-prone parts of a system. These areas can then be subject to some form of further analysis, such as a manual code review. It is hoped that su...

Publication date: 2011
